Topic Shift Analysis & Validation

Author

Doron Feingold

Published

September 8, 2025

Overview

Now that we have labels for our topics, we can plot the topic shift and review statistics that may help us understand trends across the different models.

These metrics describe the behavior and nature of each topic over time.

Topic Volatility

How “spiky” or “stable” is a topic’s prominence over time?

For each topic, calculate the standard deviation or coefficient of variation of its average gamma scores across all speeches.

Interpretation: A high volatility score indicates an “event-driven” topic that appears intensely and then fades (e.g., war, pandemic). A low score suggests a stable, persistent “background” topic that is always present.
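This metric can be sketched in a few lines. The original pipeline presumably computes it from per-speech gamma scores produced by the fitted LDA model; the function and toy series below are illustrative, using the coefficient of variation (standard deviation divided by the mean) so that topics with different baseline prominence are comparable.

```python
import statistics

def topic_volatility(yearly_gamma):
    """Coefficient of variation of a topic's average gamma
    across years; higher values indicate a spikier topic."""
    mean = statistics.mean(yearly_gamma)
    sd = statistics.stdev(yearly_gamma)
    return sd / mean

# Toy data: an event-driven topic vs. a stable background topic.
spiky = [0.05, 0.05, 0.60, 0.05, 0.05]
stable = [0.20, 0.22, 0.21, 0.19, 0.20]
```

With these inputs, `topic_volatility(spiky)` is far larger than `topic_volatility(stable)`, matching the "event-driven" vs. "background" interpretation above.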

Topic Persistence

Does a topic remain relevant long after its peak, or does it disappear completely? For each topic, calculate a “persistence score” by dividing its average gamma in the most recent year by its all-time peak gamma.

Interpretation: A score close to 1 means the topic is still highly relevant today, even if its peak was long ago. A score close to 0 means the topic has largely faded from the discourse.
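The persistence score is a simple ratio; a minimal Python sketch (the series here are hypothetical yearly gamma averages, ordered oldest to newest):

```python
def topic_persistence(yearly_gamma):
    """Ratio of the most recent year's average gamma to the
    all-time peak; near 1 = still relevant, near 0 = faded."""
    return yearly_gamma[-1] / max(yearly_gamma)

faded = [0.10, 0.60, 0.30, 0.05]    # peaked long ago, then faded
current = [0.10, 0.20, 0.30, 0.30]  # still at its peak
```

`topic_persistence(current)` returns 1.0, while `topic_persistence(faded)` is well below 0.1.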

Topic Exclusivity

How unique are the words that define a topic? For each topic, count how many of its top 15 words are not in the top 15 words of any other topic in the same model.

Interpretation: High exclusivity suggests a very distinct and well-defined theme. Low exclusivity suggests a more general or foundational topic whose vocabulary is shared across many different policy areas.
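The exclusivity count can be sketched as a set comparison across topics within one model. The dictionary structure and `k=3` toy example below are illustrative; the actual analysis uses the top 15 words per topic.

```python
def topic_exclusivity(topic_top_words, k=15):
    """For each topic, count how many of its top-k words appear
    in no other topic's top-k list within the same model."""
    scores = {}
    for name, words in topic_top_words.items():
        # Pool the top-k words of every other topic.
        others = set()
        for other, w in topic_top_words.items():
            if other != name:
                others.update(w[:k])
        scores[name] = sum(1 for word in words[:k] if word not in others)
    return scores

# Toy model: "trade" is shared, so each topic has 2 exclusive words.
topics = {"war": ["army", "navy", "trade"],
          "economy": ["trade", "jobs", "growth"]}
```

Here `topic_exclusivity(topics, k=3)` gives each topic a score of 2, since only "trade" appears in both top lists.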

4 Topics

We use the 4-topic model for its more defined and interpretable topics.
Topic Statistics and Cadence (Yearly)
Topic                                      Peak Year  Volatility  Persistence  Exclusivity
International Affairs & Defense                 1878        0.45         0.00           17
Parliamentary Procedure & Administration        1946        0.37         0.00           12
Economic Development & Growth                   1980        0.32         0.00            9
Social Policy & Community Welfare               2013        0.34         1.00           16

The plot is striking. The topics are well defined and show a clear pattern of synchronized rising and declining prominence (stable volatility). The topic labels fit well with the eras each topic represents.

8 Topics

The 8-topic model captures temporal dynamics in finer detail. Even though there are more topics, their exclusivity scores are fairly even.

Topic Statistics and Cadence (Yearly)
Topic                                      Peak Year  Volatility  Persistence  Exclusivity
Federal-Provincial Coordination                 1910        0.45         0.00           12
Parliamentary Business & Trade                  1931        0.30         0.00            9
War & International Relations                   1944        0.28         0.00           10
Jobs & Economic Policy                          1968        0.20         0.00            5
Infrastructure & Regional Development           1976        0.21         0.00            8
Economic Development Programs                   1983        0.23         0.40            6
Indigenous Affairs & Climate                    2013        0.23         0.40            7
Social Programs & Healthcare                    2020        0.24         0.21            8

We now see topics that declined in prominence and later reemerged, like “War & International Relations”. We also see periods where topics peak with less prominence. Topics in the early and late years appear to have the strongest prominence.

Validation against Major Events

To validate the topics’ prominence, we compare them against major events and governments.

Major Canadian Historical Events

The table below lists a partial but uncontroversial set of major Canadian historical events, which are marked in the plots below.

Major Canadian Historical Events Since 1867
Date Event
1867 Confederation: The Dominion of Canada is formed.
1885 Completion of the Canadian Pacific Railway.
1914 Canada enters World War I.
1918 World War I ends.
1929 Great Depression begins.
1939 Canada enters World War II.
1945 World War II ends.
1956 Suez Crisis.
1973 OPEC oil embargo.
1982 Canada Act 1982: Patriation of the Constitution.
1995 Widespread adoption of the internet.
2001 Canada joins the Afghanistan War.
2020 COVID-19 pandemic: WHO declares a global pandemic.

Governments

We can also compare the topics’ prominence against the different governments and see if they “fit”.

Conclusion

This topic modeling analysis of Canadian Speeches from the Throne reveals meaningful patterns in political discourse that align closely with historical events and governmental transitions. The 4-topic model demonstrates clear temporal succession, with International Affairs & Defense peaking in the early Confederation period (1878), Parliamentary Procedure & Administration in the immediate post-war period (peak 1946), Economic Development & Growth in the post-war boom (peak 1980), and Social Policy & Community Welfare in the modern era (peak 2013). This aligns with its high topic quality (avg_quality_score = 0.696 from prior analysis), despite a lower PCA score (avg_pc1 = -1.989), which reflects high variance in topic strength rather than poor performance. The 8-topic model provides additional granularity, capturing cyclical themes like War & International Relations (peak 1944, resurging during conflicts like the Afghanistan War) and contemporary issues like Indigenous Affairs & Climate (peak 2013). Its lower exclusivity (5–12) is acceptable for its temporal focus, emphasizing nuanced trends over distinct themes.

Validation against major historical events—such as Confederation (1867), World War II (1939–1945), the Great Depression (1929), and the COVID-19 pandemic (2020)—confirms that both models accurately reflect shifts in political priorities. For example, War & International Relations peaks during World War II, and Social Programs & Healthcare surges in 2020. Alignment with governmental priorities is also evident, such as Economic Development & Growth under post-war Liberal governments (e.g., Pearson, 1963–1968) and Social Policy & Community Welfare under recent administrations (e.g., Trudeau, 2015–present). However, low persistence scores (0.00–0.40) suggest many topics are event-driven, fading after their peak, which warrants further exploration for understanding long-term discourse continuity.

These findings demonstrate that unsupervised machine learning can effectively uncover the evolution of political discourse, providing quantitative evidence for how Canadian political priorities have shifted across nearly 160 years of parliamentary debate. The use of Speeches from the Throne as our corpus was particularly valuable, as these ceremonial addresses represent a standardized format delivered at consistent intervals, creating a reliable yardstick for measuring thematic changes over time. Unlike other parliamentary speeches that vary widely in context and purpose, Throne Speeches serve as an index of governmental priorities, allowing us to track shifts in political focus with greater precision and comparability across different eras and administrations.

Things to Consider

This pipeline provides a robust framework for analyzing thematic shifts in Canadian Speeches from the Throne, but several considerations should be noted to contextualize the findings and guide future work:

  • Methodological Limitations:

    • LDA Assumptions: Latent Dirichlet Allocation assumes a bag-of-words model, ignoring word order and context. This may miss nuanced rhetorical patterns in Throne Speeches. Future work could explore contextual models like BERTopic or Structural Topic Models (STM) to incorporate metadata (e.g., era, government party).

    • Metric Subjectivity: Custom metrics like dominance, exclusivity, and efficiency complement standard coherence measures but involve trade-offs. For instance, the k=4 model’s high quality (avg_quality_score = 0.696) contrasts with its lower PCA score (avg_pc1 = -1.989), reflecting high variance in topic strength rather than poor performance. Adding coherence scores (e.g., C_v, U_mass) could standardize evaluation.

    • OCR Errors: Historical speech transcription via Gemini 1.5 Vision may introduce errors due to faded text or non-standard fonts. Manual validation of a sample is recommended to quantify accuracy.

  • Corpus Biases:

    • Government-Centric Perspective: Throne Speeches reflect official government priorities, potentially underrepresenting opposition views or public discourse. Comparing with Hansard debates or media could provide a broader perspective.

    • Language Bias: The corpus appears to be English-only, excluding French-language speeches despite Canada’s bilingual context. This may miss Quebec-specific priorities or cultural nuances. Including French translations (available from parliamentary archives) would enhance inclusivity.

    • Historical Bias: Early speeches (e.g., Pre-War era) may reflect colonial or elite perspectives, underrepresenting Indigenous or marginalized voices. This could bias topics like “Indigenous Affairs & Climate.”

  • Validation and Interpretation:

    • Qualitative Gaps: LLM-generated labels (via Claude) are efficient but risk oversimplification (e.g., combining “Indigenous Affairs” and “Climate”). Human validation via inter-rater agreement (e.g., Cohen’s Kappa) is needed to ensure accuracy.

    • Temporal Sensitivity: Low persistence scores (0–0.4) suggest many topics are event-driven (e.g., War & International Relations peaking in 1944). This may reflect the corpus’s focus on immediate priorities rather than enduring themes, warranting further analysis.

    • Government Alignment: While topics align with historical events (e.g., WWII, COVID-19), government-specific validation lacks detail. Quantifying topic prominence by administration (e.g., Trudeau vs. Mulroney) would strengthen claims.

  • Ethical Considerations:

    • Label Sensitivity: Topics like “Indigenous Affairs & Climate” require careful interpretation to avoid conflating distinct issues or perpetuating stereotypes. Consulting Indigenous scholars could refine labels.

    • Historical Context: Representing 160 years of discourse risks oversimplifying complex socio-political shifts. Annotations or qualitative commentary should accompany quantitative results.

  • Practical Use of Models:

    • The k=4 model is ideal for broad, qualitative analysis of dominant themes (e.g., Social Policy & Community Welfare in the modern era), suitable for historical overviews or policy studies. The k=8 model excels for detailed temporal analysis, capturing cyclical trends (e.g., War & International Relations) and emerging issues (e.g., Indigenous Affairs & Climate). Researchers should run parallel analyses, using k=4 for high-level insights and k=8 for granular trend tracking.

Future Directions

  • Multilingual Analysis: Incorporate French speeches and compare thematic differences.

  • Extended Validation: Use statistical trend tests (e.g., Mann-Kendall) and government metadata for robust validation.

  • Complementary Methods: Integrate sentiment analysis (e.g., via syuzhet) or topic co-occurrence networks to explore tone and relationships.

  • Sensitivity Analysis: Test robustness against different seeds, corpus subsets (e.g., excluding COVID years), or pre-processing choices (e.g., TF-IDF weighting).
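The Mann-Kendall trend test mentioned above can be sketched via its core S statistic, which counts concordant minus discordant pairs in a series (a full test would also compute the variance of S and a z-score, as packages like pymannkendall do; this minimal version only illustrates trend direction on hypothetical yearly gamma values):

```python
def mann_kendall_s(series):
    """Mann-Kendall S statistic: for every pair (i, j) with i < j,
    +1 if the later value is larger, -1 if smaller. Positive S
    suggests an upward trend, negative S a downward trend."""
    s = 0
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if series[j] > series[i]:
                s += 1
            elif series[j] < series[i]:
                s -= 1
    return s

# Toy yearly gamma values for a topic gaining prominence.
rising = [0.1, 0.2, 0.3, 0.4, 0.5]
```

For this strictly increasing series of length 5, all 10 pairs are concordant, so `mann_kendall_s(rising)` returns 10; reversing the series yields -10, and a flat series yields 0.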